Abstract

We demonstrate that a common-line method can assemble a 3D oversampled 
diffracted intensity distribution suitable for high-resolution structure solution 
from a set of measured 2D diffraction patterns, as proposed in experiments with 
an X-ray free electron laser (XFEL) (Neutze et al., 2000). Even for a flat Ewald 
sphere, we show how the ambiguities due to Friedel's Law may be overcome.
The method breaks down for photon counts below about 10 per detector pixel, 
almost 3 orders of magnitude higher than expected for scattering by a 500 
kDa protein with an XFEL beam focused to a 0.1 μm diameter spot. Even if
10^3 orientationally similar diffraction patterns could be identified and added to
reach the requisite photon count per pixel, the need for about 10^6 orientational
classes for high-resolution structure determination suggests that about 10^9
diffraction patterns must be recorded. Assuming pulse and read-out rates of
100 Hz, such measurements would require ~ 10^7 seconds, i.e. several months
of continuous beam time. 
1 Introduction



X-ray crystallography is one of the key contributions of the physical sciences to the 
life sciences. Its application to biological, biochemical, and pharmaceutical prob- 
lems continues to enable breakthroughs (Cramer et al., 2001; Gnatt et al., 2001) 
highlighting the importance of structure to function. However, roughly 40% of bio- 
logical molecules do not crystallize, and many cannot easily be purified. These factors 
severely limit the applicability of X-ray crystallography; although more than 750,000 
proteins have been sequenced, the structures of less than 10% have been determined 
to high resolution (Protein Data Bank, http://www.pdb.org). The ability to determine
the structure of individual biological molecules - without the need for purification and 
crystallization - would constitute a fundamental breakthrough. 

The confluence of five factors has generated intense interest in single-molecule
crystallography by short-pulse X-ray scattering: a) The advent of algorithms for
determining phases from measured diffraction intensities by successive and repeated
application of constraints in real and reciprocal spaces (see e.g. Fienup, 1978; Elser,
2003; Millane, 2003), with demonstrations in astronomy (Fienup, 1982); diffractive
imaging of nanoparticles (Williams et al., 2003; Wu et al., 2005; Chapman et al.,
2006) and biological cells (Shapiro et al., 2005; Thibault et al., 2006); small-molecule
crystallography (Oszlányi and Sütő, 2003; Wu et al., 2004a); surface crystallography
(Kumpf et al., 2001; Fung et al., 2007); and protein crystallography (Miao et al.,
2001; Spence et al., 2005); b) Development of sophisticated techniques for determining
the relative orientation of electron microscope images of biological entities, such as
cells and large macromolecules (see e.g. Frank, 2006); c) Development of techniques
for producing beams of hydrated proteins by electrospraying or Rayleigh-droplet
formation (Fenn, 2002; Spence et al., 2005); d) The promise of very bright, ultra-short
pulses of hard X-rays from X-ray Free Electron Lasers (XFELs) under construction in
the US, Japan, and Europe (Normille, 2006); e) The prospect of overcoming the limits
to achievable resolution due to radiation damage by using short pulses of radiation
(Solem and Baldwin, 1982; Neutze et al., 2000).

It has been suggested (Neutze et al., 2000; Hajdu et al., 2000; Abela et al.,
2007) that an experiment to determine the structure of a biological molecule might, in
principle, proceed as follows: i) A train of individual hydrated proteins is exposed to a
synchronized train of intense X-ray pulses. As a single pulse is sufficient to destroy the
molecule, the pulses (and data collection) must be short compared with the roughly
50 fs needed for the molecular constituents to fly apart (Neutze et al., 2000; Jurek
et al., 2004). ii) The two-dimensional (2D) diffraction patterns obtained with single
pulses are read out, each pattern corresponding to an unknown, random orientation
of the molecule. iii) The relative orientations of the molecule corresponding to the 2D
diffraction patterns (and hence the relative orientations of each diffraction pattern
in 3D reciprocal space) are determined. iv) A noise-averaged 3D diffracted intensity
distribution is constructed. v) The structure of the molecule is determined from
the diffracted intensity distribution by an iterative "phasing algorithm" (Miao et al.,
2001).

As pointed out by Huldt et al. (2003), for this approach to succeed in principle,
it is necessary to develop a noise-robust algorithm to determine the relative orienta- 
tions of diffraction patterns obtained from randomly-oriented individual molecules, 
to reconstruct the 3D diffracted intensity distribution of sufficient quality, and to 
determine the secondary structure of individual biological molecules. 

In brief, starting with a collection of noisy 2D diffraction patterns of unknown 
orientation, such a method recovers the 3D electron density of a molecule, providing 
a quantitative measure of the reliability of the reconstruction. It has been suggested 
that an algorithm developed for the analogous problem of reconstructing a 3D
image of a large molecule or nanoparticle from different projected electron microscope 
images, the method of common lines, may be employed for this task. We investigate 
the capabilities and limitations of such an approach for structure recovery from
simulated short-wavelength diffraction patterns of a small (10-residue) synthetic protein,
Chignolin (Protein Data Bank entry 1UAO).

Starting with 630 simulated, noise-free 2D diffraction patterns of 0.1 Å wavelength
X-rays from random orientations of the molecule, we show that such an algorithm
is able to recover the electron density distribution of the (small) test protein
molecule, Chignolin, up to about 1 Å resolution with a fidelity measured by a correlation
coefficient of 0.7 between the model and recovered electron density distributions.
This constitutes the first demonstration of an integrated algorithm able to perform 
all the tasks necessary to extract a molecular electron density from a set of 2D diffrac- 
tion patterns of random unknown orientations. We have also investigated the limits 
of the algorithm with respect to shot noise (modeled by Poisson statistics) in the 
detected signal. Our results show that the common-line method requires at least 10 
photons/pixel. This is at least two orders of magnitude higher than the anticipated 
signal levels from the LCLS XFEL currently under construction. 

The algorithm consists of three primary modules: a) Determination of the relative
orientations of the measured diffraction patterns in 3D reciprocal space; b) From
the resulting irregular distribution of diffracted intensities, application of a gridding
algorithm to generate data on a uniform rectilinear grid in reciprocal space; c) Application
to this gridded data of a 3D iterative algorithm to find the phases associated
with the grid intensities, and recover the 3D electron density of the molecule. For
an experiment which provided independent information about the orientations of the
sample, steps b) and c) were previously implemented by Chapman et al. (2006).

2 Determination of the Relative Orientations of 
the Diffraction Patterns 

In the following, we assume that the X-ray energy is high enough, and the solid angle 
subtended by the diffraction pattern at the sample small enough that it is a reasonable 
approximation to consider each diffraction pattern as a planar central section through
the 3D reciprocal space of the molecule. In practice, this is valid for X-ray wavelengths
of about 0.1 Å. Then, the problem reduces to determining the relative orientations
of these planar sections from the data in the diffraction patterns alone without any
knowledge of the structure of the molecule. For longer wavelengths, as pointed out
previously (e.g. Huldt et al., 2003; Chapman, 2007), it will be necessary to take
account of the curvature of the Ewald sphere, when the common lines become arcs
of a circle rather than straight lines. The extra complexity of identifying such arcs
may be offset by two factors: (1) the avoidance of ambiguities stemming from the
duplication of intensities in the same 2D diffraction pattern, due to Friedel's Law
(see below); and (2) the possibility of determining all three Euler angles relating
the orientations of any two diffraction patterns from just their mutual common line
(Huldt et al., 2003), if the extra parameter of the radius of curvature of the common
arcs may be determined with sufficient accuracy.

Our approach is inspired by the analogous problem of reconstructing the 3D
structure of a macromolecule or nanoparticle from electron microscope images
representing projections of copies of the object along random directions (Frank, 2006).
This problem has been solved by exploiting the central section theorem (Farrow and
Ottensmeyer, 1992), and has been developed most notably by these authors and also
previously by van Heel (1987) and Goncharov et al. (1987). From the projection-slice
theorem, the Fourier transform of a 2D (i.e. projected) image is a central slice
through the complex 3D reciprocal space of the 3D object. Any two central sections
intersect along a line. This allows partial alignment of the two central sections with
respect to each other. Specifically, determination of the gradients of this common line
relative to, say, 2D Cartesian coordinate systems in the planes of each of the central
sections allows two of the three Euler angles specifying the relative orientations of
these central sections to be deduced. Using this procedure, it is generally possible to
determine six of the nine interplanar Euler angles between three independent diffraction
patterns. Knowledge of these six Euler angles allows the remaining three to be
deduced by geometrical construction.
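
As a minimal sketch of the statement that any two central sections intersect along a line, the common-line direction can be computed as the cross product of the two plane normals. The normals `n1` and `n2` below are hypothetical values chosen for illustration, and the helper names are not taken from the paper.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """Scalar product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))

def common_line(n1, n2):
    """Unit vector along the intersection of two central planes with normals n1 and n2."""
    c = cross(n1, n2)
    norm = math.sqrt(dot(c, c))
    if norm < 1e-12:
        raise ValueError("planes coincide; no unique common line")
    return tuple(x / norm for x in c)

# Hypothetical example: the second plane is tilted by 40 degrees about the X axis.
tilt = math.radians(40.0)
n1 = (0.0, 0.0, 1.0)
n2 = (0.0, -math.sin(tilt), math.cos(tilt))
line = common_line(n1, n2)
```

The resulting direction is perpendicular to both normals, so it lies in both planes and passes through the origin, as required of a common line.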

We point to one important difference between an application of the common- 
line approach to images (as in 3D electron microscopy) and diffraction patterns. The 
images constitute projections of the object in real space. Some applications of the
central section theorem have been performed in reciprocal space, exploiting the fact 
that 2D Fourier transforms of these images yield moduli and phases of complex ampli- 
tudes on central sections through reciprocal space. Sinograms of the data of any two 
diffraction patterns allow the unique identification of a pair of Euler angles relating 
the two central sections in 3D reciprocal space (Frank, 2006). In contrast, in our 
problem, the raw experimental data are diffracted intensities, and direct information 
is available only about the moduli of the complex amplitudes in reciprocal space. 

Friedel's Law of crystallography suggests that the intensity distribution along 
a radial line through the center of each diffraction pattern is the same as one rotated 
relative to it by 180°. This means that, for a flat Ewald sphere, the determination of
Figure 1: Transformation of central section P1 into P2 by rotation through the Euler
angles $\Phi_{12}$, $\Theta_{12}$, and $\Psi_{12}$.

the pair of Euler angles from common lines is uncertain by ± 180°. Any significant 
curvature of the Ewald sphere removes this ambiguity, but even for a flat Ewald 
sphere, this uncertainty may be resolved through consistency conditions amongst the 
Euler angles, as shown below. 
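
The Friedel degeneracy can be illustrated in one dimension. The sketch below uses a hypothetical real-valued density and a plain discrete Fourier transform (a stand-in for the continuous transform) to show that the intensities at q and -q coincide, while the amplitudes are complex conjugates of each other.

```python
import cmath

def dft(values):
    """Plain O(N^2) discrete Fourier transform."""
    n = len(values)
    return [sum(v * cmath.exp(-2j * cmath.pi * q * x / n) for x, v in enumerate(values))
            for q in range(n)]

# Hypothetical real 1D "electron density" (illustrative values only).
density = [0.0, 1.0, 3.0, 2.0, 0.5, 0.0, 0.0, 0.0]
n = len(density)

amplitudes = dft(density)
intensities = [abs(f) ** 2 for f in amplitudes]

# Largest difference between I(q) and I(-q): zero up to rounding error.
friedel_gap = max(abs(intensities[q] - intensities[(-q) % n]) for q in range(n))

# Largest difference between F(q) and the conjugate of F(-q): also zero up to rounding.
conjugate_gap = max(abs(amplitudes[q] - amplitudes[(-q) % n].conjugate()) for q in range(n))
```

Because only the intensities are measured, a radial line and its 180° counterpart carry identical data, which is the origin of the ambiguity discussed above.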

2.1 Determination of two Euler angles between two intersecting
diffraction patterns

Fig. 1 illustrates the reciprocal-space geometry of two central sections, representing
diffraction patterns, P1 and P2. Let the Euler angles relating P1 and P2 be $\Phi_{12}$,
$\Theta_{12}$, and $\Psi_{12}$. Consider three Cartesian axes X, Y, and Z, where X and Y lie in the
plane of P1, and Z is normal to it. Diffraction pattern P2 is related to P1 by a set of
three rotations. The initial rotation is through the azimuthal angle $\Phi_{12}$ about the Z
axis. Next follows a rotation through $\Theta_{12}$ about the X axis obtained after the first
rotation. Let us denote this axis by C12. The final rotation is through $\Psi_{12}$ about the
new Z axis, denoted Z'. It is clear from the figure that C12 is the line of intersection
between P1 and P2, i.e. the common line.

The orientation of the common line C12 relative to the (X,Y) Cartesian axes
in the plane of P1 is shown in Fig. 2. The gradient $m_{12}$ of C12 with respect to the
(X,Y) axes in the plane of P1 is given by

$$m_{12} = \tan(\Phi_{12}) \tag{1}$$

and hence the Euler angle

$$\Phi_{12} = \arctan(m_{12}) \tag{2}$$

can be determined if the common line C12, and hence its gradient in the plane of P1
relative to the Cartesian axes (X,Y), can be identified.

Figure 2: Orientation of common line C12 relative to the Cartesian axes (X,Y) in
central section P1.

Now note that since C12 is the common line, it must also be contained in the
plane of P2 after the Euler-angle rotations. Its orientation relative to the Cartesian
axes (X',Y') in the plane of P2 is depicted in Fig. 3. Its gradient $m_{21}$ relative to the
axes (X',Y') in P2 is

$$m_{21} = -\tan(\Psi_{12}) \tag{3}$$

and hence the Euler angle

$$\Psi_{12} = \arctan(-m_{21}) \tag{4}$$

may be determined if the common line C12, and hence its gradient in the plane of
P2 relative to the Cartesian axes (X',Y'), can be identified in the diffraction pattern
in that plane.
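
The relations (1)-(4) can be checked numerically. The sketch below assumes the three rotations compose as the matrix product Rz(Φ) Rx(Θ) Rz(Ψ), an intrinsic Z-X-Z convention that matches the verbal description above but is an assumption about the authors' exact matrix convention. The values of Φ and Ψ are those of the minimum quoted for Fig. 5; Θ is a hypothetical value.

```python
import math

def rot_z(a):
    """Rotation by angle a (radians) about the Z axis."""
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_x(a):
    """Rotation by angle a (radians) about the X axis."""
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def matmul(a, b):
    """Product of two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def euler_zxz(phi, theta, psi):
    """Assumed convention: columns are the axes X', Y', Z' of P2 in the frame of P1."""
    return matmul(rot_z(phi), matmul(rot_x(theta), rot_z(psi)))

phi, theta, psi = (math.radians(v) for v in (55.0, 70.0, 21.0))
R = euler_zxz(phi, theta, psi)

# Common line C12: the X axis after the first rotation, expressed in the frame of P1.
common = (math.cos(phi), math.sin(phi), 0.0)

# The same vector expressed in the (X', Y', Z') frame of P2, i.e. R^T applied to it.
in_p2 = [sum(R[k][i] * common[k] for k in range(3)) for i in range(3)]

m12 = common[1] / common[0]   # gradient in the plane of P1, Eq. (1)
m21 = in_p2[1] / in_p2[0]     # gradient in the plane of P2, Eq. (3)
```

The Z' component of the common line vanishes, confirming that it lies in P2, and the two gradients return Φ = arctan(m12) and Ψ = arctan(-m21) independently of Θ.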

Figure 3: Orientation of common line C12 relative to the Cartesian axes (X',Y')
after the Euler-angle rotations in the diffraction pattern P2.

Given two diffraction patterns, a pairwise numerical comparison (sinogram
comparison; Frank, 2006) of the intensity distributions along radial directions of the
two patterns may be conducted. An automated criterion, such as an R-factor, monitors
the degree of agreement. An exhaustive search is performed of all pairs of radial
distributions of the intensities on the two patterns. A global minimum of the R-factor
is assumed to determine the common line. For the diffraction patterns P1 and P2
above, denote the common line by C12. This gives estimates of the Euler angles $\Phi_{12}$
and $\Psi_{12}$.
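
The exhaustive search can be sketched as follows. Each sinogram is represented as a list of radial intensity profiles, one per angular step. The particular R-factor used here (sum of absolute differences divided by the sum of the first profile) is an assumed form, since the text does not specify it, and the intensity values are hypothetical.

```python
def r_factor(profile_a, profile_b):
    """Assumed agreement measure: sum |a - b| / sum a (smaller means better agreement)."""
    return sum(abs(x - y) for x, y in zip(profile_a, profile_b)) / sum(profile_a)

def find_common_line(sino1, sino2):
    """Compare every radial line of pattern 1 with every radial line of pattern 2.

    Returns (best R-factor, line index in pattern 1, line index in pattern 2).
    """
    best = None
    for i, line1 in enumerate(sino1):
        for j, line2 in enumerate(sino2):
            r = r_factor(line1, line2)
            if best is None or r < best[0]:
                best = (r, i, j)
    return best

# Hypothetical sinograms: 4 radial lines each (45 degree steps over 0 to 180 degrees).
sino1 = [[5.0, 3.0, 1.0], [4.0, 4.0, 2.0], [9.0, 1.0, 1.0], [2.0, 2.0, 6.0]]
sino2 = [[1.0, 8.0, 3.0], [9.0, 1.0, 1.1], [6.0, 6.0, 6.0], [3.0, 7.0, 2.0]]

step_deg = 45.0
r_min, i_best, j_best = find_common_line(sino1, sino2)
phi_est = i_best * step_deg   # azimuth of the common line in pattern 1
psi_est = j_best * step_deg   # azimuth of the common line in pattern 2
```

In this toy example the third line of the first pattern and the second line of the second pattern agree almost exactly, so the search returns azimuths of 90° and 45°. Converting these azimuths to the Euler angles follows Eqs. (2) and (4), including the sign convention for Ψ.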

Fig. 4(a) and (b) show two simulated diffraction patterns (40×40 pixels) from
random orientations of our test protein, Chignolin. The maximum lateral wavevector
in the direction of the x-axis was 10 times the Nyquist frequency for the assumed
lateral extension of the protein (16 Å). This corresponds to a reciprocal-space length
of $g = 2\pi(10)/16 = 3.93$ Å$^{-1}$. The wavevector, $k$, of 124 keV hard X-rays is about
63 Å$^{-1}$. The scattering angle corresponding to the middle of an edge of the square
diffraction pattern was calculated from $2\arcsin(g/2k) = 3.2°$. The central part of
each diffraction pattern contains high intensities of relatively low detail, but several 
orders of magnitude stronger than in the outer parts of the pattern containing the 
high-resolution structural information. A numerical search for the common lines 
between the two patterns of Fig. 4 was performed by pairwise comparisons of the 
radial intensity distributions from the two patterns in angular steps of 1°, excluding 
the pixels within a central high-intensity disc of 7-pixel radius corresponding to a 
scattering angle of ~ 1°. Effectively, the values of the azimuthal Euler angles $\Phi_{12}$
and $\Psi_{12}$ were identified by a contour plot of the form shown in Fig. 5. The identified
common lines are also shown in Fig. 4(a) and (b). Due to the Friedel Law degeneracy
mentioned above, any 180° range of azimuthal angles would be expected to contain
such a minimum. For convenience, we perform numerical searches for the $\Phi_{12}$ and $\Psi_{12}$
angles over an azimuthal angle range of 0 to 180°. Friedel's Law then suggests equally
valid values for these angles of $\Phi_{12}$ + 180° and $\Psi_{12}$ + 180°, respectively. Without
phase information, it is impossible to tell from the diffraction data alone which of
the two values of each angle is "correct".

Figure 4: Identification of the common line in two typical simulated diffraction
patterns from a model of the protein Chignolin, leading to a determination of the
azimuthal Euler angles $\Phi$ and $\Psi$ relating the 3D orientations of the diffraction patterns.

In the case of the flat Ewald sphere considered here, it is not possible to
determine the Euler angle $\Theta_{12}$ between the normals to these planes with the data
in the diffraction patterns P1 and P2 alone. In order to determine that angle, it is
necessary to have diffraction data in at least one more distinct reciprocal-space plane,
which intersects the planes of P1 and P2 along two further distinct common lines.

2.2 Determination of all nine Euler angles relating three general
central sections

Let P3 denote a third diffraction pattern (Fig. 6). Since each diffraction pattern
forms a central section through reciprocal space, each pair of diffraction patterns
intersects along a common line, with the three common lines intersecting at the origin
(O in Fig. 6). Denote the Euler angles specifying the transformation of the plane of
P2 to that of P3 by ($\Phi_{23}$, $\Theta_{23}$, $\Psi_{23}$), and those transforming the plane of P3 to that of
P1 by ($\Phi_{31}$, $\Theta_{31}$, $\Psi_{31}$). In the notation of Fig. 6, the common line between P1 and
P2 is denoted by OC, that between P2 and P3 by OA and that between P3 and P1
by OB, with A, B, and C representing points on the surface of a unit sphere centered
on O.

Figure 5: Contour map of sinogram comparisons between the two diffraction patterns
of Fig. 4 in the vicinity of the global minimum at $\Phi$ = 55°, $\Psi$ = 21°.

By analogy with the method described in the last section, a comparison of the
diffraction intensities of P2 and P3 can determine the Euler angles $\Phi_{23}$ and $\Psi_{23}$.
Likewise, comparison of the data of P3 and P1 can determine the angles $\Phi_{31}$ and
$\Psi_{31}$. This leaves only three angles to be determined: $\Theta_{12}$ between P1 and P2; $\Theta_{23}$
between P2 and P3; and $\Theta_{31}$ between P3 and P1.

The geometrical construction of Fig. 6 shows that the remaining Euler angles
are the vertex angles ∠ACB, ∠BAC, and ∠CBA of the spherical triangle ABC on
the surface of the unit sphere. Also note that the lengths of the sides of this spherical
triangle (the arcs CB, AC, and BA) are equal to the sums of angles $\Psi_{31} + \Phi_{12} = \alpha_{312}$
(say), $\Psi_{12} + \Phi_{23} = \alpha_{123}$ (say), and $\Psi_{23} + \Phi_{31} = \alpha_{231}$ (say), respectively (expressed
in radians). (For example, if we consider a transformation from plane 3 to plane 1
followed by one from plane 1 to plane 2, then the third Euler angle in the former
transformation ($\Psi_{31}$) and the first Euler angle in the latter transformation ($\Phi_{12}$)
involve rotations in the same plane, that of P1.)

Figure 6: Geometrical construction for determining the relative Euler angle $\Theta_{12}$
between diffraction patterns P1 and P2, given the six azimuthal Euler angles $\Phi$ and $\Psi$
relating P1, P2 and P3.

The cosine rule of spherical trigonometry gives

$$\cos(AB) = \cos(CA)\cos(CB) + \sin(CA)\sin(CB)\cos(\angle ACB) \tag{5}$$

that is,

$$\cos\alpha_{231} = \cos\alpha_{123}\cos\alpha_{312} + \sin\alpha_{123}\sin\alpha_{312}\cos(\Theta_{12}) \tag{6}$$

and thus

$$\Theta_{12} = \arccos\left[\frac{\cos\alpha_{231} - \cos\alpha_{123}\cos\alpha_{312}}{\sin\alpha_{123}\sin\alpha_{312}}\right] \tag{7}$$

This expression was obtained by Goncharov et al. (1987) by a different argument.
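
A minimal sketch of Eq. (7) follows. The function name `hinge_angle` and the choice to return `None` when the arccos argument falls outside [-1, 1] are illustrative assumptions. The inputs are the three arc lengths in degrees, with the arc opposite the wanted vertex angle given first.

```python
import math

def hinge_angle(alpha_231, alpha_123, alpha_312):
    """Vertex angle (degrees) from the spherical cosine rule, Eq. (7).

    alpha_231 is the arc opposite the vertex; alpha_123 and alpha_312 are the
    adjacent arcs. Returns None if no geometrically meaningful solution exists.
    """
    a, b, c = (math.radians(v) for v in (alpha_231, alpha_123, alpha_312))
    denominator = math.sin(b) * math.sin(c)
    if abs(denominator) < 1e-12:
        return None   # degenerate triangle: an adjacent arc is 0 or 180 degrees
    x = (math.cos(a) - math.cos(b) * math.cos(c)) / denominator
    if x < -1.0 or x > 1.0:
        return None   # arccos argument out of range
    return math.degrees(math.acos(x))

octant = hinge_angle(90.0, 90.0, 90.0)        # triangle covering one octant of the sphere
equilateral = hinge_angle(60.0, 60.0, 60.0)   # equilateral spherical triangle
impossible = hinge_angle(170.0, 10.0, 10.0)   # arcs that cannot close into a triangle
```

For three arcs of 90° the vertex angle is 90°, for three arcs of 60° it is arccos(1/3), about 70.53°, and for the last set the argument of the arccos lies far outside [-1, 1], so no angle is returned.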

Generalizing this result for a triplet of diffraction patterns i, j, and k, the
angle $\Theta_{ij}$ between i and j is given by

$$\Theta_{ij} = \arccos\left[\frac{\cos\alpha_{jki} - \cos\alpha_{ijk}\cos\alpha_{kij}}{\sin\alpha_{ijk}\sin\alpha_{kij}}\right] \tag{8}$$

where

$$\alpha_{ijk} = \Psi_{ij} + \Phi_{jk}, \tag{9}$$

$$\alpha_{kij} = \Psi_{ki} + \Phi_{ij}, \tag{10}$$

and

$$\alpha_{jki} = \Psi_{jk} + \Phi_{ki}, \tag{11}$$

with k a third plane.

Pair     Φ        Ψ
(1,2)    16.0     108.0
(2,3)    173.0    174.0
(3,1)    121.0    132.0

Table 1: $\Phi$ and $\Psi$ Euler angles (in degrees) relating diffraction patterns from three
random orientations of the molecule, as determined by numerical sinogram comparisons.

The above analysis shows that, provided the Euler angles $\Phi$ and $\Psi$ specifying
the directions of common lines between any three sets of diffraction patterns i, j, and
k are determined (e.g., by comparisons of sinograms from the diffraction patterns),
the Euler angles $\Theta$ about the "hinge axes" formed by the common lines amongst
those diffraction patterns can be deduced by the analytic formula (8).

2.3 Removal of the ambiguities due to Friedel's Law 

As pointed out in the previous section, the Euler angles $\Phi$ and $\Psi$ may be determined
from the diffraction pattern data only modulo 180°, due to Friedel's Law. Thus,
in addition to the initial values (in the range 0 to 180°) assigned to these angles by the
automated numerical sinogram comparison, one must also consider as possible values
of these angles $\Phi$ + 180° and $\Psi$ + 180°, respectively. The possibility of two values
for each of the three $\Phi$ angles and two for each of the three $\Psi$ angles implies four
possible values of each of the $\alpha$ angles in expressions (9), (10), and (11). Since three
distinct $\alpha$ angles enter into the formula (8), there are $4^3 = 64$ possible values of $\Theta_{ij}$ for
a given set of $\Phi$ and $\Psi$ angles deduced from three different diffraction patterns.

In fact, this is not the case. Many combinations of $\Phi$ and $\Psi$ give rise to the same
$\Theta_{ij}$, and a large number of combinations result in arguments of the arccos function
in (8) outside the range of -1 to 1, giving no geometrically meaningful solution at all.
This eliminates all but two sets of the three $\Theta$ angles. The remaining ambiguity is
due to the well-known enantiomorphic ambiguity of molecular structures that give rise
to the same diffraction intensities. This ambiguity is impossible to resolve from the
diffraction data alone. An arbitrary but consistent choice of one of the two sets of $\Theta$
angles produces one of the two enantiomers of the structure.

A concrete example from a simulation of three diffraction patterns, 1, 2, and 3,
from random orientations of the same molecule is illustrative. The $\Phi$ and $\Psi$ angles of
Table 1 were determined by numerical comparisons of sinograms of the three patterns.

Substituting all 64 combinations of $\Phi$ and $\Phi + \pi$, and $\Psi$ and $\Psi + \pi$, into Eqs. (8)-(11)
results in 46 combinations with values for the cosine of the relevant angle $\Theta$ lying
outside the range -1 to 1. Nine of the 64 combinations give rise to the $\Theta$ angles in
Table 2.

Another 9 combinations give rise to the values for the $\Theta$ angles in Table 3.

Pair     Θ
(1,2)    114.2
(2,3)    144.4
(3,1)    103.4

Table 2: One set of three (hinge) angles $\Theta$ (in degrees) between the three diffraction
patterns oriented in 3D reciprocal space, deduced from Eq. (8).

Pair     Θ
(1,2)    19.1
(2,3)    12.1
(3,1)    159.5

Table 3: Another set of values of the same angles $\Theta$ (in degrees) as in Table 2,
as determined by the same method. This solution corresponds to the enantiomer
structure.

It turns out that the two sets of values of the $\Theta$ angles determined by this
method correspond to the two enantiomorphic solutions referred to above. Thus the
method described rules out Friedel pair combinations of common-line directions that
are unphysical, producing just the two enantiomers consistent with the diffraction
data.



2.4 Averaging and Self-Consistency Checks 

A particular angle $\Theta_{ij}$ may be estimated from (8) by taking as the third diffraction
pattern k any one of the N − 2 other diffraction patterns. Each choice of third
diffraction pattern will yield two possible (usually widely separated) values of $\Theta_{ij}$,
corresponding to the two possible enantiomers. This time, since we have already
chosen an enantiomer in our previous estimate of $\Theta_{ij}$ using a different third diffraction
pattern, we choose the solution that is closest to the previously selected value of $\Theta_{ij}$,
i.e., the same enantiomer. The finally assigned value of this angle will be the average
of these values computed by (8) via all possible third planes k, namely:

$$\Theta_{ij} = \frac{1}{N-2}\sum_{k \neq i,j} \arccos\left[\frac{\cos\alpha_{jki} - \cos\alpha_{kij}\cos\alpha_{ijk}}{\sin\alpha_{kij}\sin\alpha_{ijk}}\right] \tag{12}$$
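
The enantiomer-consistent averaging of Eq. (12) can be sketched as below. For each third pattern k, two candidate values of the hinge angle are assumed to be available (one per enantiomer). The candidate closest to a previously selected reference value is kept, and the kept values are averaged. The function name and all numerical values are hypothetical.

```python
def average_hinge(candidates_per_k, reference):
    """Average the hinge angle over third patterns k, keeping one enantiomer.

    candidates_per_k: list of (theta_a, theta_b) pairs in degrees, one pair per k.
    reference: previously selected value of the angle, fixing the enantiomer.
    """
    chosen = [min(pair, key=lambda theta: abs(theta - reference))
              for pair in candidates_per_k]
    return sum(chosen) / len(chosen)

# Hypothetical candidate pairs from three different third patterns k.
pairs = [(114.0, 19.5), (18.7, 114.6), (113.9, 19.0)]
theta_ij = average_hinge(pairs, reference=114.2)
```

Here the values 114.0, 114.6 and 113.9 are selected and averaged. Selecting by closeness to the reference works only because the two enantiomeric solutions are usually widely separated, as noted above.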
The calculations of the $\Theta$ angles via (12), including the tests of enantiomeric
consistency, are very rapid. The bulk of the computational time involves the sinogram
comparisons for diffraction pattern pairs. The time for these computations scales as
the total number of pairs amongst diffraction patterns, namely, $N(N-1)/2$. To
save computational time for the 630 diffraction patterns, we divided them into sets of
about 10 diffraction patterns each. So long as two diffraction patterns are common to
each of these sets of about 10, the method determines the relative orientations of all
diffraction patterns relative to these two for a given enantiomer, with a computational
time saving of a factor of approximately $(630/10)^2 \approx 4000$.

A method of sinogram matching determines common-line directions by comparison
between pairs of projections/diffraction patterns at a time. Farrow and Ottensmeyer
(1992) have suggested a method of simultaneously taking account of data
from all available projections by means of quaternion mathematics. We propose here
an alternative method of ensuring that all determined Euler angles are consistent with
the data of all available diffraction patterns. With noisy data, such a self-consistency
condition may even help reduce some of the errors due to noise. Consider any three
noncoplanar diffraction patterns, i, j, and k. Then

$$R(\Phi_{ij},\Theta_{ij},\Psi_{ij})\,R(\Phi_{jk},\Theta_{jk},\Psi_{jk})\,R(\Phi_{ki},\Theta_{ki},\Psi_{ki}) = I \tag{13}$$

where $R$ is the 3D rotation matrix which transforms plane i to plane j in 3D reciprocal
space, and $I$ is the 3D unit matrix. Since $R(\Phi_{jk},\Theta_{jk},\Psi_{jk})^{-1} = R(\Phi_{kj},\Theta_{kj},\Psi_{kj})$ and
$R(\Phi_{ki},\Theta_{ki},\Psi_{ki})^{-1} = R(\Phi_{ik},\Theta_{ik},\Psi_{ik})$, Eq. (13) may be rewritten

$$R(\Phi_{ij},\Theta_{ij},\Psi_{ij}) = R(\Phi_{ik},\Theta_{ik},\Psi_{ik})\,R(\Phi_{kj},\Theta_{kj},\Psi_{kj}), \quad \forall k. \tag{14}$$

If the $\Phi$ and $\Psi$ angles on the RHS of (14) have been found by sinogram matching,
and the $\Theta$ angles on the RHS via Eq. (12), Eq. (14) may be used to update the Euler
angles $\Phi_{ij}$, $\Theta_{ij}$, and $\Psi_{ij}$ on its LHS. Since, for given planes i and j, there are N − 2
other planes k, these angles may be calculated independently from N − 2 equations
of the form (14), and the values averaged. A different pair of planes ij can then be
selected and the procedure repeated to update the Euler angles on the LHSs of (14)
relating all pairs of planes.
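
The closure relation (14) can be illustrated with explicit matrices. The sketch below makes two assumptions that are not stated in the text: each pattern i has an absolute orientation matrix A_i built in the Z-X-Z convention, and the relative rotation is defined as R_ij = A_i^T A_j, which satisfies both the inverse relation and Eq. (14) by construction. The orientation angles are hypothetical.

```python
import math

def rot_z(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_x(a):
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def transpose(m):
    return [list(row) for row in zip(*m)]

def euler_zxz(phi, theta, psi):
    """Assumed convention: R = Rz(phi) Rx(theta) Rz(psi), angles in radians."""
    return matmul(rot_z(phi), matmul(rot_x(theta), rot_z(psi)))

def euler_angles(R):
    """Recover (phi, theta, psi) in radians from a Z-X-Z matrix, valid for 0 < theta < pi."""
    theta = math.acos(max(-1.0, min(1.0, R[2][2])))
    phi = math.atan2(R[0][2], -R[1][2])
    psi = math.atan2(R[2][0], R[2][1])
    return phi, theta, psi

# Hypothetical absolute orientations (degrees) of three diffraction patterns.
A = [euler_zxz(*(math.radians(v) for v in angles))
     for angles in ((10.0, 35.0, 80.0), (140.0, 60.0, 20.0), (75.0, 110.0, 130.0))]

def relative(i, j):
    """Assumed definition of the rotation taking pattern i to pattern j."""
    return matmul(transpose(A[i]), A[j])

R_ij = relative(0, 1)
R_via_k = matmul(relative(0, 2), relative(2, 1))   # right-hand side of Eq. (14)
closure_error = max(abs(R_ij[r][c] - R_via_k[r][c]) for r in range(3) for c in range(3))

# Updated Euler angles for the pair (i, j), read off from the product via k.
phi_ij, theta_ij, psi_ij = (math.degrees(v) for v in euler_angles(R_via_k))
```

With noise-free matrices the closure error is at rounding level. With noisy angle estimates, the angles extracted from each product via a different k would differ slightly, and these are the values that the text proposes to average.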

To summarize this section, we have described a detailed procedure for orienting
in 3D reciprocal space a large number of diffraction patterns from random
unknown orientations of an object without any knowledge of the structure of the object.
The recovery of a molecular electron density from such data requires the determination
of the phases associated with these intensities. This may be done by the method
of oversampling (Miao et al., 2001), involving iterative Fourier transformations of the
data from reciprocal to real space, and applications of appropriate constraints in each 
of the spaces. A conventional fast Fourier transform (FFT) algorithm (Cooley and 
Tukey, 1965) requires data on a regular Cartesian grid in each space. Thus, it is 
necessary to perform a gridding operation in 3D reciprocal space to prepare the data 
for such an iterative phasing algorithm. 
Figure 7: 2D representation of the 3D gridding process. Data from regular grids on
randomly oriented central sections are interpolated onto a regular rectilinear 3D grid 
convenient for a fast Fourier transform routine. 

3 Forming a regular 3D diffracted intensity grid 
from randomly inclined central sections 

We perform this 3D gridding operation by means of the MATLAB routine
griddata3. This routine fits a hypersurface of the form w = f(x, y, z) to the irregularly
spaced data from the randomly inclined central sections in reciprocal space, using
a tessellation-based linear interpolation, which incorporates the method of Delaunay
triangulation (Delaunay, 1934). The density of the uniform 3D grid points was chosen
to ensure oversampling with respect to the Nyquist criterion for an object of the size
of our test molecule. For the purposes of our present simulation, where the small test
protein Chignolin is known to be smaller than a cube of linear dimension 16 Å, we
take a reciprocal space sampling corresponding to the Nyquist frequency of a cube of
double this linear dimension, namely 32 Å. That is, the sampling frequency of the uniform
rectilinear 3D reciprocal space grid is twice the Nyquist frequency corresponding to
the diameter of the object in each of the three linear dimensions.
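
The sampling arithmetic can be made explicit. The sketch below assumes the 2π convention implied by the maximum lateral wavevector 2π(10)/16 Å⁻¹ quoted in Section 2.1, so that the Nyquist interval for an object of size L is taken as 2π/L. The variable names are illustrative.

```python
import math

object_size = 16.0                              # Å, linear dimension bounding Chignolin
nyquist_step = 2.0 * math.pi / object_size      # Å^-1, Nyquist interval for a 16 Å object
grid_step = 2.0 * math.pi / (2.0 * object_size) # Å^-1, interval for the 32 Å cube

oversampling = nyquist_step / grid_step         # linear oversampling ratio per dimension

q_max = 10.0 * nyquist_step                     # Å^-1, maximum lateral wavevector (Section 2.1)
half_width_pixels = q_max / grid_step           # grid points from the origin to the edge
```

Under this reading the grid interval is about 0.196 Å⁻¹, the linear oversampling ratio is 2, and the maximum lateral wavevector of 3.93 Å⁻¹ corresponds to a half-width of 20 grid points, which matches the 40×40-pixel patterns of Fig. 4.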

As the test was performed on simulated data, the efficiency of the determi- 
nation of the relative orientations of the simulated diffraction patterns and of the 
gridding algorithm could be evaluated by comparing the diffraction data on the final 
uniform 3D Cartesian grid with diffraction intensities calculated directly on the same 
grid from the PDB atomic data. The usual X-ray R-factor was used to compare the 
two datasets. For our simulation of 630 diffraction patterns from the protein
Chignolin, we obtained an R-factor value of 0.04, indicating a high fidelity for the
orientation and gridding process.



4 Phasing of the diffraction data and the recovery 
of the 3D molecular electron density 

The determination of the phases associated with the gridded diffraction data, and
hence the 3D molecular electron density, was performed by a combination of an
iterative oversampling algorithm (Miao et al., 2001), which successively imposes
constraints/modifications of the electron density in real space through object domain
operations (ODO) (Fienup, 1978; Oszlányi and Sütő, 2003) and in reciprocal space
(Oszlányi and Sütő, 2004).

The 3D Fourier transform of the gridded diffraction intensities yields the 3D 
autocorrelation function of the molecular electron density. Since the extent of the 
autocorrelation map is twice that of the electron density map, the approximate spatial 
extent of the molecular electron density can be found directly from the diffraction 
intensities (Marchesini et al., 2003).
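
A one-dimensional sketch illustrates why the autocorrelation reveals the extent of the object. A hypothetical density occupying 4 samples of a 16-sample array is transformed with a plain discrete Fourier transform; the inverse transform of the intensities is its autocorrelation, which is non-zero over lags -3 to 3, i.e. 2 × 4 − 1 samples.

```python
import cmath

def dft(values, sign=-1):
    """Plain O(N^2) discrete Fourier transform; sign=+1 gives the unnormalized inverse."""
    n = len(values)
    return [sum(v * cmath.exp(sign * 2j * cmath.pi * q * x / n) for x, v in enumerate(values))
            for q in range(n)]

n = 16
support_length = 4
density = [1.0, 2.0, 3.0, 1.0] + [0.0] * (n - support_length)   # hypothetical values

# Intensities are the squared moduli of the Fourier amplitudes.
intensities = [abs(f) ** 2 for f in dft(density)]

# The inverse transform of the intensities is the (circular) autocorrelation.
autocorr = [v.real / n for v in dft(intensities, sign=+1)]

# Map array indices to signed lags and find where the autocorrelation is non-zero.
lags = [lag if lag <= n // 2 else lag - n for lag in range(n)]
nonzero_lags = sorted(l for l, a in zip(lags, autocorr) if abs(a) > 1e-9)
extent = nonzero_lags[-1] - nonzero_lags[0] + 1
```

The array is long enough (oversampled) for the autocorrelation not to wrap around onto itself, which is what allows its extent, and hence that of the density, to be read off directly.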

A flow chart and pseudo code of our iterative phasing algorithm are shown in Fig.
8. The square roots of the gridded diffraction intensities are assumed proportional to
the protein structure factors $F_{\mathbf{q}}$, say, where a reciprocal-space vector $\mathbf{q}$ is defined by

$$\mathbf{q} = h\mathbf{b}_1 + k\mathbf{b}_2 + l\mathbf{b}_3 \tag{15}$$

where the unit vectors $\mathbf{b}_i$ (i = 1, 2, 3) of the reciprocal space are defined by the usual
relationships

$$\mathbf{b}_i \cdot \mathbf{a}_j = \delta_{ij} \tag{16}$$

with respect to real-space unit vectors $\mathbf{a}_j$ so chosen as to define a 3D volume expected
to contain the molecule. Since the phases associated with these structure factors are
initially unknown, we begin by assigning random phases to those structure factors $F_{\mathbf{q}}$
corresponding to values of the Laue index $l > 0$. Assumption of Friedel's Law,

$$F_{-\mathbf{q}} = F_{\mathbf{q}}^{*} \tag{17}$$

then allows the assignment of complex structure factors for $l < 0$. An (inverse)
FFT algorithm calculates an initial 3D electron density distribution, whose reality
(in the mathematical sense) is assured by the above Friedel relationship amongst the
structure factors. In general, the computed electron density is spread over a real-space
volume larger than that of the molecule.
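
The initialization step can be sketched in one dimension. Random phases are assigned to the structure factors with positive index, Friedel's Law (17) fixes those with negative index, and the inverse transform is then real. The magnitudes, the array length and the use of a plain inverse discrete Fourier transform in place of a 3D FFT are illustrative assumptions.

```python
import cmath
import random

random.seed(0)                            # fixed seed so the sketch is reproducible

n = 9                                     # odd length avoids a self-conjugate Nyquist term
magnitudes = [4.0, 2.5, 1.5, 1.0, 0.5]    # hypothetical |F_q| for q = 0..4

F = [0j] * n
F[0] = complex(magnitudes[0], 0.0)        # the q = 0 term is its own Friedel mate: real
for q in range(1, 5):
    phase = random.uniform(0.0, 2.0 * cmath.pi)
    F[q] = magnitudes[q] * cmath.exp(1j * phase)   # random phase for q > 0
    F[n - q] = F[q].conjugate()                    # Friedel's Law: F(-q) = F(q)*

# Inverse transform to obtain the initial density estimate.
density = [sum(F[q] * cmath.exp(2j * cmath.pi * q * x / n) for q in range(n)) / n
           for x in range(n)]
max_imag = max(abs(value.imag) for value in density)
```

The imaginary parts vanish to rounding error whatever phases are drawn, while the real part is generally spread over the whole array, which is why a support constraint is needed next.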

A support constraint is now applied in real space by setting to zero the electron
density outside the volume expected to be occupied by the protein (Fienup, 1978). In
addition, the electron density within the expected volume of the protein is modified
according to the charge flipping prescription of Oszlányi and Sütő (2003), which
was shown by Wu et al. (2004b) to be a special case of Fienup's (1982) output-output
algorithm with feedback parameter β = 2. According to the charge flipping
prescription, electron density values that exceed a certain threshold δ are unmodified,
while the signs of those below this threshold are reversed. The value of this threshold
is chosen to optimize the progress of the algorithm, as monitored by an R-factor
between the gridded "experimental" structure factors and those calculated from a
Fourier transform of the electron density recovered by the algorithm. (The value for
δ taken in practice was typically around 10% of the maximum electron density.) A
Fourier transform of the modified electron density specifies the same distribution in
reciprocal space. The continued reality of this modified electron density ensures that
the resulting calculated structure factors have phases satisfying Friedel's Law.

Figure 8: Flow chart and pseudo code of the iterative phasing algorithm.
[Recoverable elements of the chart: start at n = 0; a real-space branch testing
whether ρ_j lies inside the support and whether ρ_j^(n) exceeds threshold 1; the
Fourier transform {ρ_j^(n)} → {F_q^(n)}; and a reciprocal-space branch testing
whether |F_q^(n)| exceeds threshold 2, with a phase factor exp[iπ/2] applied
otherwise.]
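The real-space operation described above can be written in a few lines. This is a minimal sketch under stated assumptions: the function name, the one-dimensional toy density, and the boolean support mask are illustrative and not taken from the paper.

```python
import numpy as np

def real_space_step(density, support, delta):
    """Zero the density outside the support; inside it, keep values above
    delta and reverse the sign of all values below delta (charge flipping)."""
    flipped = np.where(density > delta, density, -density)
    return np.where(support, flipped, 0.0)

# Toy example: the last sample lies outside the assumed support.
density = np.array([0.9, 0.05, -0.2, 0.5, 0.3])
support = np.array([True, True, True, True, False])
delta = 0.1 * density.max()    # about 10% of the maximum, as quoted in the text

modified = real_space_step(density, support, delta)
print(modified)
```

The step keeps the density real, so the structure factors computed from it automatically satisfy Friedel's Law.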

A different threshold is employed to divide the reciprocal-space amplitudes
into strong and weak reflections. The magnitude of this threshold amplitude was again
chosen by monitoring the same R-factor as for the real-space threshold above. The
optimum division was found when the weakest 55% of the reflections were classified
as weak. A reciprocal-space constraint is applied to the strong reflections: their
amplitudes (or moduli) are replaced by the square roots of the corresponding measured
intensities, while retaining the phases from the Fourier transform operation. As for
the weak reflections, their moduli are left unchanged, but their phases are shifted by
π/2. The resulting set of complex structure factors is then subject to an inverse
Fourier transformation, which yields another real-space electron distribution. This is
modified in the same way as before, and the whole process repeated for several
iterations.

Pair of planes    Recovered/Actual Φ    Recovered/Actual Θ    Recovered/Actual Ψ
(0,1)             64.0/64.3             148.2/144.6           48.0/48.3
(0,2)             16.0/20.8             18.7/20.6             2.0/177.6
(0,3)             18.0/18.7             79.3/81.8             90.0/90.1
(0,4)             144.0/140.5           40.8/43.5             118.0/122.4
(0,5)             16.0/13.9             14.2/16.0             108.0/110.3
(0,6)             Not found
(0,7)             174.0/174.2           138.9/138.2           100.0/99.6
(0,8)             90.0/90.0             92.3/87.9             168.0/169.7
(0,9)             Not found
(0,10)            Not found
Mean error        1.7                   2.5                   2.0

Table 4: Relative orientations of copies of a single molecule, as specified by a set of
Euler angles Φ, Θ, and Ψ, and the same angles recovered from the identified common
lines between pairs of the diffraction patterns (reciprocal space planes) labeled 0 to
10, using the analytical formulae described in the text. Also shown are the mean
absolute errors in the determinations of these angles (all angles specified in degrees).
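The reciprocal-space operation can be sketched as below. Several choices here are assumptions rather than details given in the text: reflections are classified as weak by their measured amplitude, the π/2 shift is applied with opposite signs to the two members of a Friedel pair (so that the next density stays real), and the R-factor is taken in its conventional form. All function names are hypothetical.

```python
import numpy as np

def minus_q(array):
    """Value at index -q (modulo the grid size) for every q on an FFT grid."""
    return np.roll(np.flip(array), shift=1, axis=tuple(range(array.ndim)))

def half_space_sign(shape):
    """+1 on one half of reciprocal space, -1 on the Friedel mates, 0 where q = -q."""
    index = np.arange(int(np.prod(shape))).reshape(shape)
    return np.sign(minus_q(index) - index)

def reciprocal_space_step(f_calc, measured, weak_fraction=0.55):
    """Strong reflections: measured modulus with calculated phase.
    Weak reflections: calculated modulus with phase shifted by pi/2."""
    weak = measured <= np.quantile(measured, weak_fraction)
    weak = weak | minus_q(weak)               # keep Friedel mates in one class
    strong_values = measured * np.exp(1j * np.angle(f_calc))
    shift = half_space_sign(f_calc.shape) * (np.pi / 2)
    weak_values = f_calc * np.exp(1j * shift)
    return np.where(weak, weak_values, strong_values), weak

def r_factor(measured, f_calc):
    """Conventional R-factor between measured and calculated moduli (assumed form)."""
    return float(np.abs(measured - np.abs(f_calc)).sum() / measured.sum())

rng = np.random.default_rng(1)
measured = np.abs(np.fft.fftn(rng.random((6, 6, 6))))   # stand-in "data"
f_calc = np.fft.fftn(rng.random((6, 6, 6)))             # stand-in current estimate

f_new, weak = reciprocal_space_step(f_calc, measured)
density_new = np.fft.ifftn(f_new)
print(r_factor(measured, f_calc), float(np.abs(density_new.imag).max()))
```

Applying the same +π/2 shift to both members of a Friedel pair would break Eq. (17), and taking the real part of the resulting density would then cancel the weak reflections entirely, which is why the sketch uses the signed shift.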

This algorithm constrains the solution to be consistent with the measured
intensities of the strong reflections in reciprocal space, and with the expected size of
the object in real space. Subject to these constraints, it allows a thorough exploration
of configuration space by iteratively modifying the phases of the weak reflections in
reciprocal space and the signs of the small electron densities in real space.

5 Results for Noise-Free Simulations 

We have tested the effectiveness of this algorithm on a set of 630 simulated diffraction
patterns computed out to about 1 Å resolution from random orientations of the small
synthetic protein Chignolin, simulated from the atomic elements and coordinate data
taken from the Protein Data Bank, and atomic scattering factors calculated from the
relevant Cromer-Mann coefficients (Cromer and Mann, 1968).

We then employed our common-line method to determine the Euler angles 
specifying the relative orientations of each of the simulated diffraction patterns. A 
typical comparison of the recovered angles with the known angles from the simulations 
is shown in Table 4. 

Occasionally the common-line search (section 2) does not succeed in accurately
finding the Euler angles Φ and Ψ relating a pair of diffraction patterns. If an angle
is calculated from such inexact values, the calculated argument of the arccosine in (8)
may not lie in the range -1 to +1, and thus may not yield a value for the Euler angle
Θ. As shown by Table 4, this is the case for the angle relating diffraction patterns
(0,9) and (0,10). In such cases, we simply ignore the data in diffraction patterns 9 and
10. Proceeding this way, we were able to determine self-consistent solutions for the
orientations of 401 out of the 630 diffraction patterns simulated. For our sample of 11
diffraction patterns, the mean accuracy of the Euler angle determination is about 2°.
We used the data of the 401 correctly oriented diffraction patterns to assign intensities
to an irregularly-spaced set of points in reciprocal space.
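The rejection rule described above amounts to a domain check before the arccosine is taken. The sketch below is an illustration only: the expression of Eq. (8) is not reproduced here, so `cos_theta` stands for whatever value that expression returns, and the function name and tolerance are assumptions.

```python
import math

def euler_theta_degrees(cos_theta, tolerance=1e-9):
    """Return the angle in degrees, or None when the cosine is out of range."""
    if abs(cos_theta) > 1.0 + tolerance:
        return None               # inconsistent common-line angles: discard pair
    clipped = max(-1.0, min(1.0, cos_theta))   # absorb pure rounding error
    return math.degrees(math.acos(clipped))

print(euler_theta_degrees(0.0))   # a valid cosine
print(euler_theta_degrees(1.2))   # an invalid cosine: the pattern pair is skipped
```

A pair for which the function returns None would be reported as "Not found" and its diffraction pattern left out of the assembly.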

Is this sufficiently accurate? To answer this question one has to ask how
accurately these angles must be determined in order to assign intensities correctly to
each of the points of an oversampled 3D reciprocal-space grid. The required angular
accuracy is determined by the angle subtended at the origin of reciprocal space by a
reciprocal-space voxel at the highest resolution. Since the width of a
reciprocal-space voxel is 1/(2L) (Huldt et al., 2003), where L is a linear dimension of
the molecule investigated, the angular resolution required is [1/(2L)]/(1/R) = R/(2L)
radians, where R is the required resolution. For the example of the small protein
modeled here, taking L = 15 Å and R = 1 Å, we may deduce that the required angular
resolution is about 1/30th of a radian, or about 2°. Table 4 shows that this is achieved.
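The arithmetic behind this estimate is short enough to check directly. The function name below is hypothetical; the inputs are the values quoted in the text.

```python
import math

def required_angular_resolution_deg(resolution, size):
    """Angle subtended by a voxel of width 1/(2L) at radius 1/R: R/(2L) radians."""
    return math.degrees(resolution / (2.0 * size))

angle_1A = required_angular_resolution_deg(1.0, 15.0)   # R = 1 Å, L = 15 Å
angle_2A = required_angular_resolution_deg(2.0, 15.0)   # R = 2 Å, L = 15 Å
print(round(angle_1A, 2), round(angle_2A, 2))
```

The required accuracy scales linearly with R, so accepting half the resolution doubles the angular tolerance.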

Application of the gridding algorithm of section 3 produced a set of diffraction
intensities on such a uniform grid of points in 3D reciprocal space. Subsequent
application of the iterative phasing algorithm of section 4 recovered the electron
density distribution in the lower panel of Fig. 9 in about 65 iterations of the phasing
algorithm.

For purposes of comparison, we also simulated the complex structure factors 
(amplitude and phase) on the same oversampled 3D grid of reciprocal-space points 
as used in the iterative phasing algorithm. An inverse Fourier transform of these 
(correct) complex structure factors recovered the protein electron density distribution 
in the upper panel of Fig. 9 at a resolution consistent with the extent of the diffraction 
data. 

The recovered electron density is in reasonable agreement with that of the 
starting model, with a correlation coefficient of 0.7 between the two electron density 
distributions. 
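The text does not state which correlation coefficient was used. Assuming the usual Pearson coefficient evaluated over all grid points, the comparison of two density maps could be sketched as follows; the arrays are random stand-ins, not the Chignolin maps.

```python
import numpy as np

def density_correlation(rho_a, rho_b):
    """Pearson correlation coefficient between two maps on the same grid."""
    return float(np.corrcoef(rho_a.ravel(), rho_b.ravel())[0, 1])

rng = np.random.default_rng(0)
model = rng.random((16, 16, 16))                         # stand-in "model" map
recovered = model + 0.5 * rng.standard_normal(model.shape)   # stand-in recovery

print(density_correlation(model, model))
print(density_correlation(model, recovered))
```

The coefficient is insensitive to an overall scale or offset of either map, which suits a comparison of densities recovered only up to a proportionality constant.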

Figure 9: Electron density of the protein Chignolin (PDB entry 1UAO) to about 1 Å
resolution. Upper panel: from the PDB model. Lower panel: from multiple diffraction
patterns of the molecule in random orientations. The secondary structure is clearly
visible.

6 Effect of Shot Noise in Measured Diffraction Patterns

Even with radiation from an ultra-bright source such as an XFEL, the number of
detected photons per pixel of a diffraction pattern from a single biomolecule is
expected to be very small. Therefore, it is important to investigate the robustness
of any algorithm to shot noise. We do this by assuming different mean photon counts
per pixel $I_0$ in the high-resolution (or high-q) part of the diffraction pattern. If
$I_0$ is the expectation value of the photon count at any particular pixel, the actual
number $I$ of detected photons is determined by the Poisson distribution

$$P(I \mid I_0) = \frac{I_0^{I}}{I!}\, e^{-I_0} \qquad (18)$$

where $P(I \mid I_0)$ is the probability of measuring $I$ photons. By comparing with the
noise-free simulations, we investigated the effectiveness of the common-line algorithm
in determining the relative Euler angles of the same diffraction patterns 0-10 of Table
4 for mean photon counts per pixel $I_0 = 100$ and $I_0 = 10$.
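A minimal sketch of how such shot noise can be simulated is given below, assuming NumPy's random generator and a uniform test pattern in place of a real diffraction pattern; the function names are hypothetical.

```python
import math
import numpy as np

def poisson_probability(count, mean):
    """Eq. (18): probability of detecting `count` photons when `mean` are expected."""
    return mean ** count * math.exp(-mean) / math.factorial(count)

def add_shot_noise(expected_counts, rng):
    """Draw one integer photon count per pixel from a Poisson distribution."""
    return rng.poisson(expected_counts)

rng = np.random.default_rng(0)
for mean_count in (100.0, 10.0):
    pattern = np.full((64, 64), mean_count)      # uniform stand-in pattern
    noisy = add_shot_noise(pattern, rng)
    relative_noise = noisy.std() / noisy.mean()
    print(mean_count, round(float(relative_noise), 3))
```

For a Poisson process the relative fluctuation is roughly 1/sqrt(I_0), so reducing the mean count from 100 to 10 raises the per-pixel noise from about 10% to about 30% of the signal.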

The results of Table 4 were almost perfectly reproduced for $I_0 = 100$, but
there was substantial deterioration of the fidelity of the determined Euler angles for
a mean photon count of $I_0 = 10$ (Table 5).

In the same subset of 10 diffraction patterns, the algorithm was able to determine
just 4 sets of relative Euler angles out of 10, with a mean angular accuracy of
about 3.5°. We stated earlier that the required angular resolution is R/(2L) ≈ 2°
for 1 Å resolution. This may be relaxed to about 4° if 2 Å resolution is accepted.
However, the fact that the orientations of fewer than half the diffraction patterns
could be determined suggests that a mean detected photon count per pulse per pixel of
10 is close to the practical lower limit for the direct use of a common-line approach.
Of course, our current simulations were performed for a small protein, and we have not
explicitly tested the dependence of this limit on protein size. However, it is of
interest to note that a similar limit of counts per pixel is typical for cryo-electron
microscopy of biological entities.

Pair of planes    Recovered/Actual Φ    Recovered/Actual Θ    Recovered/Actual Ψ
(0,1)             68.0/64.3             150.6/144.6           52.0/48.3
(0,2)             14.0/20.8             18.1/20.6             4.0/177.6
(0,3)             18.0/18.7             85.6/81.8             92.0/90.1
(0,4)             Not found
(0,5)             Not found
(0,6)             Not found
(0,7)             174.0/174.2           144.3/138.2           100.0/99.6
(0,8)             Not found
(0,9)             Not found
(0,10)            Not found
Mean error        2.8                   4.6                   3.0

Table 5: Comparison of the determination of the relative Euler angles of the same 11
diffraction patterns as for the noise-free case of Table 4 for noisy diffraction patterns
with mean photon count of 10 photons/pixel, with the (shot) noise modeled by a
Poisson distribution.

The significance of these results becomes apparent on comparing these values
of $I_0$ with the estimated values of the same quantity under the usual assumptions of
the incident beam flux from an XFEL for two different values of the focussed beam
diameter D, as shown in Table 6.

In compiling this Table, we assumed that the molecule consists of $N_{atom}$ non-H
atoms (for the present purpose modeled as C atoms). We also distinguished between
small-q and large-q scattering (where q is the scattering-induced momentum change
of an incident photon) for the following reason. There is a large difference between the
expected photon count for small-q and for high-q scattering. Put simply, all electrons
in the sample scatter more or less in phase in the low-q regime, thereby giving rise
to a scattered intensity proportional to $N^2$, where N is the number of electrons in
the sample (~ $Z N_{atom}$, with Z the average atomic number, and $N_{atom}$ the number
of atoms), while in the high-q regime, the scattered intensity is proportional to N.

E (keV)   λ (nm)   σ_C (mm²/str. ×10⁻²²)   D (μm)   W (photons/mm²/pulse)   n (ph/pulse/pixel)
                   Small q    Large q                                       Small q    Large q
12.4      0.1      2.87       0.26         1        2.6×10¹⁸                50         1.4×10⁻⁴
                                           0.1      2.6×10²⁰                5000       1.4×10⁻²

Table 6: Expected counts of detected photons/pulse/pixel for both small-q and large-q
scattering by a 500 kDa protein with an XFEL source. E represents the photon
energy, λ its wavelength, σ_C the typical differential scattering cross-sections for a C
atom for small/large q, D the assumed diameter of a focussed beam incident on the
sample, W the photon fluence, and n the estimated scattered photon count per pulse
per detector pixel.

Values for the differential scattering cross section of 12.4 keV X-rays by a C
atom for small-q and large-q scattering were taken from the tables on elastic
photon-atom scattering posted at the web site of the Lawrence Livermore National
Laboratory (http://www-phys.llnl.gov/Research/scattering). Taking the effective width
of a pixel as $\Delta k = 2\pi/(2L)$ (Huldt et al., 2003), where L is a linear dimension
of the molecule, implies a reciprocal-space pixel area of
$(\Delta k)^2 = 4\pi^2/(4L^2)$. The solid angle subtended at the sample by each pixel is
then

$$\Omega = \frac{4\pi^2/(4L^2)}{k^2} = \frac{\lambda^2}{4L^2}\ \text{str.} \qquad (19)$$

where k is the wavenumber of the radiation, and λ the wavelength. If a is the average
spacing of non-H atoms, we may take $(L/a)^3 \simeq N_{atom}$, or
$L \simeq a N_{atom}^{1/3}$. Substituting this value for L in (19), we deduce

$$\Omega \simeq \frac{\lambda^2}{4 a^2 N_{atom}^{2/3}} \qquad (20)$$

For high q, the measured photon count per pulse per pixel (n) is estimated as

$$n \simeq \Omega N_{atom} W \sigma_C = \frac{\lambda^2}{4a^2} N_{atom}^{1/3} W \sigma_C, \qquad (21)$$

with a similar expression for low q, but with an $N_{atom}^{4/3}$ dependence. Taking
$N_{atom}$ = 35,000 (corresponding to a protein of approximately 500 kDa molecular
weight), values for λ, σ_C, D, and W given in Table 6, and a taken as 2 Å, we deduce
the values for n for small/large momentum transfer q shown in the right-hand columns
of Table 6.
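The estimate of Eqs. (19) to (21) can be reproduced with a few lines of arithmetic. In the sketch below the function names are hypothetical, and the powers of ten attached to the fluence (2.6 × 10^20 photons/mm²/pulse for the 0.1 μm focus) and to the cross sections (units of 10^-22 mm²/str.) are assumptions, chosen because they are consistent with the count of about 10^-2 photons per high-q pixel discussed in the text.

```python
def solid_angle_per_pixel(wavelength, spacing, n_atom):
    """Eq. (20): solid angle per pixel in steradians (both lengths in Å)."""
    return wavelength ** 2 / (4.0 * spacing ** 2 * n_atom ** (2.0 / 3.0))

def photons_per_pixel(wavelength, spacing, n_atom, fluence, cross_section,
                      low_q=False):
    """Eq. (21) for high q; for low q the atoms scatter in phase (N_atom squared)."""
    n_scatter = n_atom ** 2 if low_q else n_atom
    omega = solid_angle_per_pixel(wavelength, spacing, n_atom)
    return omega * n_scatter * fluence * cross_section

N_ATOM = 35_000
WAVELENGTH = 1.0            # Å, i.e. 0.1 nm
SPACING = 2.0               # Å, average spacing of non-H atoms
FLUENCE = 2.6e20            # photons/mm^2/pulse, D = 0.1 μm (assumed exponent)
SIGMA_LARGE_Q = 0.26e-22    # mm^2/str. (assumed exponent)
SIGMA_SMALL_Q = 2.87e-22    # mm^2/str. (assumed exponent)

n_large_q = photons_per_pixel(WAVELENGTH, SPACING, N_ATOM, FLUENCE, SIGMA_LARGE_Q)
n_small_q = photons_per_pixel(WAVELENGTH, SPACING, N_ATOM, FLUENCE, SIGMA_SMALL_Q,
                              low_q=True)
print(n_large_q, n_small_q)
```

With these inputs the high-q count comes out near 1.4 × 10^-2 and the low-q count near 5 × 10^3 photons per pulse per pixel; a 1 μm focus lowers the fluence, and hence both counts, by a factor of 100.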

It is important to note that, even for a focussed beam diameter of 0.1 μm, the
expected photon count per pixel for large-q data (needed for high-resolution structure
determination) is approximately 3 orders of magnitude lower than the level at which
the common-line method is able to reliably find the relative orientations of the
diffraction patterns.

The estimates of Table 6 suggest that the photon counts in the low-q region of 
a single diffraction pattern of a large protein may be high enough to render the effects 
of shot noise negligible. However, it is unlikely that structural information directly 
available from low-q data will yield anything more than the overall shape of the 
scattering object, as in the technique of small angle X-ray scattering (SAXS). It is an 
open question whether a coarse orientating of patterns, which may be performed with 
the low-q data, will help to orientate entire diffraction patterns sufficiently accurately 
to exploit the high-q data for high-resolution structure determination.

7 Discussion



The ability to record and sort 2D diffraction patterns from individual molecules is 
important for a number of reasons. First and most obvious is the elimination of the 
need for crystals. Second, and in our view equally important, is the potential to 
sort and separate diffraction patterns from different molecules or different molecular 
conformations in the beam prior to structure recovery. The fact, for example, that 2D 
diffraction patterns from different molecules do not have common lines might allow 
the diffraction patterns to be separated into sets before further analysis, with each 
set designating a different type of molecule or molecular conformation. 

This paper has been concerned with developing an algorithm to determine
the structure of a single scattering entity (such as a protein, or nanoparticle) from
multiple diffraction patterns due to scattering from unknown random orientations of
identical copies of the object. We have shown that an adaptation of a "common-line"
algorithm from 3D electron microscopy/tomography is able to accomplish this task
for noise-free diffraction patterns in the flat Ewald sphere limit. There is little doubt
that an extension of such a method to curved common lines will similarly enable
structure determination from low-noise diffraction patterns at the ~ 1 Å wavelengths
characteristic of currently planned XFELs (Hajdu et al., 2000; Abela et al., 2007).

Of much greater concern is the fact that, even with the most powerful XFELs
currently envisaged, the expected number of scattered photons per high-q pixel of
a molecular diffraction pattern from a single radiation pulse is far too low for the
alignment approaches proposed so far. The common-line method relies on identifying
similar intensity distributions along single lines in two low-intensity (and thus
high-noise) diffraction patterns. As such, it is hardly surprising that it is very
sensitive to noise. Our conclusion is that such a method requires a mean photon count
of at least 10 per pixel in the high-q region of a diffraction pattern, about 3 orders
of magnitude greater than expected from a proposed experiment with an XFEL (Table 6).

We note that in the proposed experiments, the minimum photon count per
diffraction pattern orientation is not determined by the minimum required to
reconstruct a satisfactory 3D image of the object from projections of known
orientations, as in conventional tomography, but rather by the need for correct
classification and assembly of a 3D diffraction volume from data in diffraction
patterns alone. The minimum photon count in the former case may be quite low, since
the dose fractionation theorem for 3D electron microscopy/tomography (Hegerl & Hoppe,
1976; McEwen, Downing, & Glaeser, 1985) states that "A three-dimensional
reconstruction requires the same integral dose as a conventional two-dimensional
micrograph provided that the level of significance and resolution are identical". This
suggests that if there are M projections (or in our case, diffraction patterns) the
photon count per pixel required for an equally successful 3D reconstruction will be
just 1/M of that for a single projected image (in our case a single diffraction
pattern). In the absence of orientational information, this theorem does not help,
because a much higher photon count (about 10 photons/pixel per orientational class) is
needed for the successful assembly of a 3D diffraction volume suitable for structure
solution. In short, the minimum photon count for correct classification and
orientation is much higher than that needed for structure recovery of a single
biomolecule.

We now consider the possibility of classifying measured diffraction patterns
into sets of similar orientations, and averaging their intensities to improve their
signal-to-noise ratios. Bortel and Faigel (2007) find that successful classification of
measured diffraction patterns of a protein modeled by 35,000 C atoms requires an
incident photon fluence of 10²⁸ m⁻²/pulse = 10²² mm⁻²/pulse. Comparison with our
Table 6 shows that this is 100× the fluence expected from an XFEL beam focused down to
a 0.1 μm diameter spot. Even if a more efficient method of classification were found,
Table 6 indicates that the number of photons expected per high-resolution pixel of
the diffraction pattern is ~ 10⁻², approximately 3 orders of magnitude smaller than
that needed for the common-line method. This suggests the need for the summation
of the data from about 1000 diffraction patterns per orientational class. Since Bortel
and Faigel also find that about 10⁶ classes are required for faithful recovery of the
structure of such a molecule (assumed to be 100 Å in diameter) to 3 Å resolution,
it will be necessary to measure ~ 10⁹ diffraction patterns. Assuming photon pulse
and read-out rates of 100 Hz, this would require ~ 10⁷ seconds, or several months of
continuous beam time for a single experiment.
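The beam-time estimate follows from simple arithmetic, sketched below. The function names are hypothetical, and the inputs (10 photons per pixel required, 10^-2 expected per pulse, 10^6 orientational classes, 100 Hz) are the order-of-magnitude values assumed from the surrounding estimates.

```python
def patterns_per_class(required_counts, expected_counts):
    """Number of patterns that must be summed to reach the required count."""
    return required_counts / expected_counts

def beam_time_seconds(n_per_class, n_classes, rate_hz):
    """Total acquisition time at a fixed pulse and read-out rate."""
    return n_per_class * n_classes / rate_hz

per_class = patterns_per_class(10.0, 1e-2)      # about 1000 patterns per class
total_patterns = per_class * 1e6                # about 10^9 patterns
seconds = beam_time_seconds(per_class, 1e6, 100.0)
months = seconds / (86_400 * 30.44)             # mean month of 30.44 days
print(total_patterns, seconds, round(months, 1))
```

At 100 Hz the total of about 10^7 seconds corresponds to a little under four months of uninterrupted data collection.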

We note that the classification problem is not eased for an even larger scattering
object, such as a virus or nanoparticle. Eq. (21) suggests that the number
of photons per detector pixel varies as $N_{atom}^{1/3}$. Thus, a scattering entity
modeled by $N_{atom}$ = 10⁶ C atoms would scatter 3 times as many photons into each
pixel. However, Bortel and Faigel (2007) estimate that the number of orientational
classes required for the structure solution of such an entity (assumed to be of 300 Å
diameter) to 3 Å resolution is 2×10⁷, an extra factor of about 30 or so over the 500
kDa protein. Thus the total number of diffraction patterns required, and hence the
time for data collection, would be expected to increase to 10⁸ seconds, i.e. several
years for a single experiment.

Assembling a 3D intensity distribution from the low-intensity diffraction patterns
from single molecules obtainable from single pulses of an XFEL may not be
practical with the common-line method, the only approach mentioned in the literature
so far (see e.g. the Technical Design Report of the European XFEL, Abela et
al., 2007). This calls for the development of entirely new algorithms that perform
structure solution by simultaneously acting on all the data of all measured diffraction
patterns. Two such approaches have been suggested by the present authors (Ourmazd
et al., 2007; Saldin et al., 2007) and will be the subject of forthcoming publications. [1]

[1] An alternative approach has been proposed by Spence et al. (2005) for improving the
signal-to-noise ratio, in which the orientational alignment of the molecules is
controlled by means of crossed laser beams.

8 Conclusions



We have presented the first demonstration of an integrated algorithm to determine 
the electron density of a particle or large biomolecule, such as a protein, from a 
collection of 2D diffraction patterns, each from a molecule in an unknown random 
orientation, as expected from the proposed X-ray scattering experiments with XFEL 
sources. The method involves first determining the relative orientations of the differ- 
ent 2D diffraction patterns, interpolating the data onto a regular 3D Cartesian grid in 
reciprocal space at a sampling rate higher than the Nyquist frequency for the size of 
the molecule, determining the phases associated with the measured amplitudes, and 
hence deducing the 3D electron density of the molecule or nanoparticle. There are 
significant differences with similar algorithms developed previously for 3D electron 
microscopy, due to the absence of direct phase information, and the ambiguities due 
to Friedel's Law. We have shown how these difficulties may be overcome, even in the 
limit of a flat Ewald sphere, by the imposition of appropriate consistency conditions. 
These enable the determination of the relative orientations of the diffraction patterns, 
and hence the molecular structure, to within the usual enantiomeric uncertainty. 

We have tested the algorithm with a computer simulation for a model protein, 
at an X-ray wavelength short enough to justify the flat Ewald sphere approximation, 
with and without Poisson noise for the detected photons. Adaptation of this algorithm 
to take account of curved common lines can readily extend the applicability of this 
approach to longer X-ray wavelengths. 

Our simulations have highlighted an important limitation of a common-line 
method for finding the relative orientations of diffraction patterns from random ori- 
entations of a sample. Such methods depend on comparing the intensity distributions 
along particular lines in two diffraction patterns, thus using only a very small fraction 
of the available data for each orientation determination. They are consequently very 
sensitive to noise. 

We find that the common-line method ceases to work reliably for mean photon
counts per pixel below about 10 in the high-q part of a diffraction pattern. These
regions contain the high-resolution information needed to resolve the secondary
structure of a protein. Since the scattering by a typical (500 kDa) protein of a pulse
from a planned XFEL beam focused to a spot of 0.1 μm diameter is expected to produce
some 1000× fewer photons per detector pixel, the use of a common-line method would seem
to necessitate the classification and averaging of at least 1000 low-intensity
diffraction patterns per orientational class to correctly assemble the scattered
intensity distribution in 3D reciprocal space.

The method of classifying diffraction patterns into orientational classes examined
by Bortel and Faigel (2007) requires at least 100× the anticipated XFEL fluence.
Even if superior classification methods were devised, the determination of the
structure of a 100 Å-wide molecule to 3 Å resolution would require about 10⁶
orientational classes (Bortel and Faigel, 2007). Assuming pulse and read-out rates of
100 Hz, data collection for a 500 kDa protein would require several months of
continuous beam time.

We thank Veit Elser, Leonard Feldman, Paul Fuoss, John Spence, and Brian
Stephenson for helpful discussions.